ABSTRACT

We report alterations to the murine leukemia virus 
(MLV) integrase (IN) protein that successfully result 
in decreasing its integration frequency at transcrip- 
tion start sites and CpG islands, thereby reducing the 
potential for insertional activation. The host bromo 
and extraterminal (BET) proteins Brd2, 3 and 4 in- 
teract with the MLV IN protein primarily through the 
BET protein ET domain. Using solution NMR, protein 
interaction studies, and next generation sequencing, 
we show that the C-terminal tail peptide region of MLV 
IN is important for the interaction with BET proteins 
and that disruption of this interaction through trun- 
cation mutations affects the global targeting profile 
of MLV vectors. The use of the unstructured tails of 
gammaretroviral INs to direct association with com- 
plexes at active promoters parallels that used by his- 
tones and RNA polymerase II. Viruses bearing MLV IN 
C-terminal truncations can provide new avenues to 
improve the safety profile of gammaretroviral vectors 
for human gene therapy. 



INTRODUCTION 

Retroviruses have been used as an important tool in devel- 
oping gene therapy vectors. Their ability to stably integrate 
genetic information into the host genome has enabled the 
exploitation of these viruses for many gene delivery appli- 
cations. Gammaretroviral vectors have been used success- 
fully to rectify defects of SCID-X1 and other diseases (1). 
However, despite the efficiency in gene delivery, insertional 
mutagenesis can result in clonal expansion of cells bearing 
specific integrants (2), associated with the preferential inte- 
gration of murine leukemia virus (MLV) vectors upstream 
of transcription start sites (TSS) and CpG islands near pro- 
moter regions (3). This complicates their use in gene ther- 
apy. 

In the retroviral replication cycle, the viral reverse tran- 
scriptase enzyme converts the single-stranded RNA viral 
genome into double-stranded DNA, which is associated 
within a preintegration complex (PIC). MLV requires cells 
to undergo mitosis. The viral p12 protein, which is part of
the PIC, is responsible for tethering the viral genome to
the host mitotic chromatin (4-5). However, the p12 protein
does not mediate targeting of the viral PIC toward genomic 
hotspots for retroviral integration such as TSS and CpG is- 
lands (5). The viral integrase (IN), upon entry into the host 
nucleus, mediates the integration of the viral DNA into the
host genome (6). The viral IN protein is the primary viral 
determinant for target-site selection (7). 

It has recently been shown that the host bromo and ex-
traterminal (BET) domain proteins Brd2, 3 and 4 bind
the viral IN protein through their conserved ET domain
(8-10). Downregulation of BET proteins with siR-
NAs (8-9), as well as treatment with the small-molecule in-
hibitor JQ1, which selectively impairs BET protein associ-
ation with chromatin, decreased preferential in-
tegration targeting at TSS and CpG islands (8-10). In the
presence of LEDGF-BET protein chimeras (10), integra- 
tion can be shifted toward LEDGF binding sites. In vitro 
interaction studies and coimmunoprecipitation of overex- 
pressed MLV IN in mammalian cells have mapped the BET 
binding sites to different domains of MLV IN including the 
catalytic core domain (CCD) (9), the C-terminal domain 
(CTD) (8,10) and the IN C-terminus (10). 

In this report, we demonstrate that the C-terminal 
polypeptide segment of the viral IN protein, which we refer 
to as the tail peptide (TP), is a key determinant in mediating 
the interaction of the viral IN protein with the ET domain 
of the BET proteins. This interaction provides a structural 
basis for global in vivo integration-site preferences. MLV
viruses bearing IN lacking this C-terminal 28-residue TP are
viable in tissue culture (11-12) and in vitro (13-14). Hence,
deletion of the TP does not disrupt the catalytic properties
of IN. MLV IN lacking the TP loses its interaction with
BET proteins, thus presenting a direct mechanism to alter 
target-site utilization. Virus bearing IN lacking the TP, or 
with it replaced with other peptides, exhibits markedly di- 
minished viral integrations in mammalian cells near TSS, 
CpG islands, and known BET binding sites. 

MATERIALS AND METHODS

Plasmid constructs 

IN1-385XN (previously named IN in6215a (11)) is an infec-
tious M-MLV clone in which a NotI restriction site was
inserted at an XbaI site and results in premature trunca-
tion of the IN protein at IN position 385. Insertions into
the M-MLV infectious clone expressing IN in6215a ((11);
IN1-385XN) were performed at the NotI site using oligonu-
cleotides, as described in the Supplementary Data. The
MLV IN CTD was expressed in the bacterial pET15_NESG
vector (15) at the NdeI and BamHI sites. Details of the glu-
tathione S-transferase (GST)-IN and Brd3 ET constructs
along with the cloning protocols are provided in the Sup- 
plementary Data. 

Protein purification for NMR studies 

Protein expression and purification from the pET-based 
construct for MLV IN CTD was performed as previously 
described (15-18) with the following modifications: protein 
expression was induced with 1 mM IPTG at 17°C for 25 
h. Induction was carried out in MJ9 media (17-18) in the 
presence of either 15N-labeled ammonium chloride or 15N
ammonium chloride plus uniformly 13C-enriched glucose.
Following Ni-NTA resin purification (Qiagen) as per man-
ufacturer's instructions, fractions eluted in 400 mM imida-
zole were pooled and concentrated to a volume of less than
250 µL using an Amicon Ultracel-3K centrifugal filter unit
(Millipore). The concentrated protein fraction was then in-
jected into an ÄKTA FPLC and resolved on a Superdex
75 gel filtration column (GE Healthcare) in 20 mM sodium
phosphate pH 8.0, 300 mM NaCl, 50 mM potassium glu-
tamate (KC5H8NO4) and 5 mM 2-mercaptoethanol. The
eluted fractions were then pooled and concentrated using 
an Amicon Ultracel-3K centrifugal filter unit (Millipore). 
All isotopes were purchased from Cambridge Isotope Lab-
oratories.

Next generation sequencing of MLV IN C-terminal trunca- 
tions 

Sequencing was performed exactly as described before (5). 
Integration sites near Brd2, 3 and 4 binding
sites were correlated using ChIP-seq data (19) and analyzed
as described previously (8). The statistical tests rely on the
variance-covariance matrix of the relative ranks of the inte-
gration sites to construct Wald-type test statistics, which are re-
ferred to the chi-square distribution to obtain P values
(20). Datasets used in the analysis (as defined in the inset
box): FV fibroblast (21), HIV-1 (22), MLV ((5) and this work),
WT MLV IN1-408, MLV IN1-385 8N, MLV IN1-385 16H and
MLV IN1-385XN in6215a (11) (Supplementary Table S4).

NMR analysis of MLV CTD structure and ET interactions 

Single- (15N) and double- (13C, 15N) enriched MLV IN CTD
protein samples for studies of complex formation were con-
centrated to approximately 200 µM-1 mM
and, where indicated, mixed with 2 mM unlabeled Brd3 ET
in a buffer containing 5% 2H2O, 50 mM DSS, 300 mM
NaCl, 50 mM potassium glutamate, 25 mM sodium phos-
phate at pH 7.0 or pH 8.0 and 5 mM 2-mercaptoethanol.
Samples for NMR studies at pH 6.5 were prepared as de-
scribed for pH 8.0, except at 100 mM NaCl and in the
absence of 2-mercaptoethanol. Sequence-specific backbone
1H, 13C and 15N resonance assignments for free and ET-
bound IN329-408 were determined at pH 7.0 and 8.0 us-
ing standard triple-resonance NMR experiments (23). Res-
onance assignments and the solution NMR structure de-
termination of IN329-408 at pH 6.5 are reported elsewhere
(PDB ID: 2M9U, BMRB ID: 19299). All spectra were
recorded using a Bruker Avance 800 MHz spectrometer at
25°C. NMR data were processed using NMRPipe (24) and
SPARKY (T. D. Goddard and D. G. Kneller, SPARKY 3,
University of California, San Francisco). The oligomeric
states of both free and complexed proteins (1:1 molar ra-
tio of IN CTD:Brd3 ET at 100 µM or 200 µM) were
assessed by measurements of rotational correlation times
computed from 15N T1 and T2 nuclear relaxation measure-
ments, as described previously (25-26). Detailed protocols
and calibration data for molecular correlation time mea-
surements based on 15N nuclear relaxation rate data are
provided online at http://www.nmr2.buffalo.edu/nesg.wiki/Main_Page.
Peptides used for the TP competition assay
are 'WT TP', SRLTWRVQRSQNPLKIRLTREAP, and
'mutant TP', SRLTARVQRSQNAAAIALTREAP (Pep-
tide 2.0 Inc.). The peptides were solubilized in deionized
water to a final concentration of about 3 mM.

In vitro pull down assays 

Pull-down assays were performed as described (8).

Protein interaction trap assay

The protein interaction trap assay (27) was adapted and per- 
formed as described in Supplementary Data. 

Data deposition 

The sequences reported in this paper have been de- 
posited in the National Center for Biotechnology Infor- 
mation Sequence Read Archive (project accession number 
SRP021184). 

RESULTS 

The structure of the MLV IN CTD changes in the presence 
of the Brd3 ET domain 

MLV IN interacts through its CTD with the ET domain of BET fam-
ily members (8). We have deter-
mined the three-dimensional structure of the MLV IN CTD
(28) (PDB ID 2M9U) using conventional triple-resonance
solution-state NMR methods. It consists of an SH3 fold fol-
lowed by a long unstructured tail (Figure 1A, B and C).
The solution structure and NMR resonance assignments
(BMRB ID 19299) of the IN CTD provide unique tools to
characterize IN-BET protein interactions.

Changes in the structural environment of amino-acid 
residues within the IN CTD upon complex formation with 
the ET domain of the BET protein can be monitored using 
the chemical shifts of backbone and side chain NMR reso- 
nances. Changes include both chemical shift changes, and 
changes in amide resonance intensities due to altered 
exchange rates with solvent water protons. Both of these ef- 
fects upon complex formation are referred to in this study 
as chemical shift perturbations (CSPs). Using standard 2D
[15N-1H]-heteronuclear single quantum coherence (HSQC)-
type NMR experiments, backbone amide, side chain amide,
arginine guanidino and tryptophan indole 15N-1H NMR res-
onances can be monitored. Changes in 15N and/or 1H res-
onance frequencies and/or intensities can arise from many
different aspects of the complex formation, including in- 
terfacial interactions, disorder to order transitions and/or 
occlusion of amide protons from solvent exchange due to 
formation of protein-protein interfaces or ordered structure 
within the TP region. In the case of complex formation be- 
tween the IN CTD and BET ET domains, the TP region of 
the CTD becomes ordered in the complex and CSPs arise 
from both this change in the structure of the TP region and 
the specific interactions of the intermolecular interface. 

Using standard [15N-1H]-HSQC experiments at 600
MHz, IN CTD-Brd3 ET interactions could be detected at
pH > 7.0, but not at the pH of 6.5 that was used to solve the so-
lution NMR structure of the IN CTD (28). Accordingly,
backbone 1H, 15N and 13C resonance assignments for IN
CTD, with and without complex formation with the unla-
beled Brd3 ET domain, were redetermined at pH 7.0 and
pH 8.0 using standard triple-resonance NMR experiments
(23). Tryptophan indole NεH resonances appearing in the
[15N-1H]-HSQC spectra were also assigned. These NH res-
onance assignments are tabulated in Supplementary Table
S1.

In transitioning from pH 6.5 to pH 8.0, the HSQC spec- 
tra of the free CTD domain exhibit attenuation of surface 
amide proton intensities due to base-catalyzed amide pro- 
ton exchange. This attenuation of amide proton intensities 
is illustrated in Supplementary Figure S1A and B. However,
in the spectra of the complex formed between 15N-enriched
IN CTD and unlabeled Brd3 ET domains, many of these 
same amide sites do not exhibit attenuation due to solvent- 
exchange broadening. This is because they are occluded due 
to structure formation in the TP region and/or due to the 
interface formed with the ET domain. 

Figure 1 documents the significant NH CSPs, including
changes in both frequencies and intensities of resonances,
of IN CTD due to complex formation with Brd3 ET. Over-
laying the [15N-1H]-HSQC spectra at pH 8.0 for the IN
CTD in the presence (Figure 1D, blue) and absence (Fig-
ure 1D, red) of Brd3 ET demonstrates that binding of the
Brd3 ET domain protects specific amide resonances from
solvent-exchange broadening typical of surface amide NH
resonances at pH 8.0. Significant resonance frequency shifts
as large as 0.5 ppm are observed for the NH resonances of
R391, V392, Q393, R394, L399, K400, I401, R402, L403
and T404 (Figure 1D and Supplementary Figure S1A and
B). In addition, tryptophan indole NεH resonances for
residues W347, W369 and W390 appear or shift in the pres-
ence of Brd3 ET. Other amide resonances altered in inten-
sity and/or frequency upon complex formation with Brd3
ET include residues Q339 and L343 (Figure 1A, in β1-β2
loop), T359 (Figure 1A, in β2-β3 loop), G365 (Figure 1A,
in β3-β4 loop) and G381-383, R387, L388, T389, W390,
S395, N397, R405, E406 and A407 (Figure 1A, C-terminal
region). Several of these TP-associated NH resonances are
observed in the complex but not in the free CTD at pH 8
(Figure 1D and Supplementary Figure S1B). Significantly,
at pH 6.5, these resonances have amide 1H chemical shifts
typical of disordered polypeptide segments (i.e., in the range
of 7.5-8.5 ppm; Supplementary Figure S1A).

Figure IB and C shows a representation of the sites in 
the IN CTD domain that are perturbed in the presence 
of Brd3 ET mapped onto the 3D structure characterized 
at pH 6.5. The core β-barrel of the SH3 fold of the IN
CTD is not affected by complex formation with the Brd3 
ET. Rather, 22 of the 25 non-proline residues whose amide 
resonances exhibit significant CSPs (i.e., either slowed in 
solvent exchange and/or altered in chemical shift) are lo- 
calized in the TP region, C-terminal to the SH3 fold. This 
suggests that the heterodimeric interaction is facilitated pri-
marily through residues localized in the disordered TP re-
gion, many of which are conserved in other INs of the gam-
maretroviral genus (Supplementary Figure S1C).

Figure 1. NMR analysis of MLV IN:Brd3 ET interaction. A. MLV IN CTD sequence is displayed with the following color codes: green indicates backbone
amide resonance chemical shifts that were the same for IN CTD in the presence or absence of Brd3 ET at pH 8.0; red indicates backbone amide resonances
that are observed in the presence of Brd3 ET, but solvent exchange broadened in the absence of complex formation, and/or amide resonances that exhibit
frequency shifts upon complex formation; blue indicates backbone amide resonance assignments that could not be determined at pH 8.0 either in the
presence or absence of the Brd3 ET domain. Residues for which HN amide assignments could not be determined in either free or ET-bound CTD at pH
8.0 include R337, H338, T340, K341, N342, R346, W347, A367, S385, S386, Q396, as well as proline residues P345, P350, P358, P380, P384, P398 and
P408 which lack amide protons. B. Cα backbone trace, along with key structural features, of an ensemble of 20 conformers of MLV IN CTD from amino
acids 329-408 (PDB ID 2M9U) is shown in this panel with the same color codes as described in panel A. C. Ribbon representation of a single MLV IN
CTD conformer is shown within a transparent view of a surface space fill model. Color code is the same as in panel A with key structural features and
specific amino-acid residues that show significant CSPs and/or reduced amide proton exchange broadening marked in red. All images were generated
using PyMol (The PyMOL Molecular Graphics System, Version 1.2r3pre, Schrödinger, LLC). D. NMR spectrum of the IN CTD-Brd3 ET complex.
Overlay of [15N-1H]-HSQC spectra of 15N-enriched IN CTD construct IN329-408 at pH 8.0, 300 mM NaCl either with (blue) or without (red) unlabeled
Brd3 ET. The stoichiometric ratio of IN329-408 (1 mM) and Brd3 ET (2 mM) was 1:2 at the concentrations indicated. Backbone amide resonances that
are not affected by complex formation are labeled with sequence-specific assignments in black; assigned amide resonances that are not observable due to
solvent-exchange broadening in the absence of ET, but become observable upon complex formation, as well as resonances exhibiting significant CSPs upon
complex formation are labeled in magenta. All amide peak resonances not observable due to solvent-exchange broadening in the absence of ET, but become
observable upon complex formation are marked with black circles; some of these could not be unambiguously assigned at pH 8.0. The curved green arrows
indicate the CSPs due to complex formation of the amide resonances assigned to residues L399, L403, A407 and the side-chain indole NεH resonance of
W390. Tryptophan W347 and W369 NεH side chain indole resonances with significant proton exchange rate reduction due to complex formation are also
indicated. Peak resonances labeled in green are assigned to the non-cleavable affinity tag.

Figure 2. MLV IN interacts with the BET family through the IN TP. A.
Rotational correlation time measurements. Plot of rotational correlation
time (τc) computed from 15N T1/T2 relaxation rate measurements versus
molecular weight. Known monomeric protein standards are indicated in
red. Data for the three IN CTD constructs (IN329-408, IN329-385XN and
IN329-385) are indicated in blue (individually) and green (in the presence
of the Brd3 ET domain). The molar ratio of the IN CTD constructs and
Brd3 ET proteins was 1:1 at 100 µM. Plots of the 15N T1 and T2 nuclear
relaxation data for each sample are presented in Supplementary Figure
S2. B. Interaction of MLV IN TP with Brd4. GST pull-down experiments
performed with WT GST-MLV IN1-408 and IN ΔC construct GST-MLV
IN1-385 with Brd4(1-720). Coomassie stain of SDS-PAGE of GST pull-down
products. Components of individual reactions are indicated as well as 10%
of the purified Brd4(1-720) input sample. The predicted molecular weight of the
GST-MLV IN1-408 fusion protein is 69 kDa. Positions of molecular weight
standards are indicated on the left.



The absence of the IN CTD TP disrupts the heterodimeric 
interaction 

Having established that the heterodimeric IN CTD-Brd3
ET interaction is facilitated primarily by residues in the
TP that are largely disordered in the free IN CTD (28),
we next explored the impact of truncating these disordered
residues on complex formation. Molecular rotational cor-
relation time (τc) measurements, based on simple 1D 15N
nuclear relaxation rate measurements, can be used to esti-
mate molecular mass changes upon complex formation un-
der precisely defined conditions (25). These data are sum-
marized in Figure 2A with supporting data in Supplemen-
tary Figure S2. A calibrated plot of τc versus molecular
weight (Figure 2A) indicates that at 100 µM concentra-
tion, 300 mM NaCl, pH 8.0 and 25°C, MLV CTD con-
struct IN329-408 behaves as a monomer (τc = 6.6 ns, MW
~ 10.5 kDa). Addition of stoichiometric amounts of unla-
beled Brd3 ET results in a heterodimer (τc = 10.9 ns, MW ~
18 kDa). Two MLV IN CTD constructs truncated at the C-
terminus (ΔC), IN329-385 and IN329-385XN, were also found
to behave as monomers in either the presence or absence of
unlabeled Brd3 ET (i.e., τc = 4.5-6.1 ns, MW ~ 8 ± 1 kDa),
as illustrated in Figure 2A. Under these conditions, removal
of the IN TP prevents heterodimer formation.
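
As an illustration of how a rotational correlation time can be estimated from 15N T1 and T2 values, the Python sketch below uses the commonly applied slow-tumbling approximation tau_c ~ sqrt(6 T1/T2 - 7) / (4 pi nu_N), where nu_N is the 15N resonance frequency in Hz. This formula is an assumption on our part: the text states only that the correlation times were computed from 15N T1/T2 data as described in (25-26). The function name and the example T1, T2 and nu_N values are hypothetical and are not measurements from this study.

```python
import math


def rotational_correlation_time_ns(t1_s, t2_s, nu_n_hz):
    """Estimate tau_c (ns) from 15N T1 and T2 (s) and the 15N frequency (Hz).

    Assumed approximation (not stated in the text):
        tau_c ~= sqrt(6 * T1 / T2 - 7) / (4 * pi * nu_N)
    """
    if t1_s <= 0 or t2_s <= 0 or nu_n_hz <= 0:
        raise ValueError("T1, T2 and nu_N must be positive")
    radicand = 6.0 * t1_s / t2_s - 7.0
    if radicand <= 0:
        # The approximation only holds in the slow-tumbling regime.
        raise ValueError("T1/T2 ratio too small for this approximation")
    tau_c_s = math.sqrt(radicand) / (4.0 * math.pi * nu_n_hz)
    return tau_c_s * 1e9  # seconds -> nanoseconds


# Hypothetical inputs: T1 = 0.8 s, T2 = 0.1 s, 15N frequency of about
# 60.8 MHz (corresponding to a 600 MHz 1H spectrometer).
example_tau_c = rotational_correlation_time_ns(0.8, 0.1, 60.8e6)
print(f"tau_c = {example_tau_c:.2f} ns")
```

A value obtained this way would then be read against a calibration curve of known monomeric standards, as in Figure 2A, to estimate an apparent molecular weight.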

In order to validate that the truncation of the TP region
did not affect the 3D structure of the rest of the IN CTD, we
also recorded [15N-1H]-HMQC NMR spectra of the IN ΔC
constructs (Supplementary Figure S3). These data demon-
strate that truncation of the C-terminal disordered TP re-
gion does not affect the 3D structure of the rest of the
CTD domain, including the core SH3 fold. The activity of
IN constructs truncated at residue 385 has been extensively
characterized in vitro and in vivo (11-14,29-30). Additional
characterization with respect to viral titer, reverse transcrip-
tase activity, minus-strand strong stop and plus-strand ex-
tension, Alu-PCR and two-end integration assays is de-
scribed in the Supplementary Data.

These NMR studies of the IN CTD in the presence 
of the Brd3 ET domain provide biophysical evidence of 
the two proteins interacting, with changes in the MLV IN 
CTD NMR spectra clustering within the disordered IN C-
terminal TP. The interaction with the Brd3 ET domain re-
quires the C-terminal polypeptide region, and is suppressed
by deletion of residues 386-408.

Additional protein interaction studies corroborate the
role of the TP region in the structural basis of Brd3 ET
binding. Figure 2B and Supplementary Figure S4 docu-
ment direct binding between GST-MLV IN1-408 and full-
length Brd2, Brd3 and Brd4 constructs. This interaction
was lost with constructs lacking the CTD TP (e.g., GST-
MLV IN1-385 in Figure 2B and Supplementary Figure S4).
Yeast two-hybrid studies were also used to analyze the inter-
action between the mouse Brd2 ET domain and Moloney
MLV (M-MLV) IN (mIN1-408) (Table 1). The interaction
between the LexA-mIN bait plasmid and the GAL4-ET
domain prey plasmid in the colony lift assay was readily
detected, as indicated by the blue color observed in the
β-galactosidase assays. This interaction was stronger than
the dimerization between M-MLV IN monomers (pSH2-
mIN plus pGADNOT-mIN or pACT2-mIN). Deletion of
the MLV IN C-terminal TP (pSH2-mIN1-385XN; IN ΔC)
markedly decreased the interaction with the Brd2 ET do-
main (pACT2-Brd2 ET; Table 1). Negative controls re-
mained white, other than a very weak background inter-
action observed between the MLV IN expression construct
(pGADNOT-5'mIN) and the empty vector (pSH2-1). Con-
structs expressing the HIV-1 p66 and p51 reverse transcriptase
subunits (pSH2-p66 and pGADNOT-p51) served as posi-
tive controls.

Taken together, the protein interaction data summarized 
in Figure 2, Table 1 and Supplementary Figures S3 and S4 
unequivocally demonstrate that the TP region of MLV IN, 
including residues 386-408, is critical for interactions with 
the ET domains of Brd2, Brd3 and Brd4 BET family mem- 
bers. 

TP competition assay shows sequence-dependent disruption 
of the heterodimeric interaction 

Table 1. Interaction between M-MLV IN and Brd2 ET LexA DNA binding domain fusions
Qualitative β-galactosidase yeast colony lift assays. Results represent the average colorimetric values from six independent transformation reactions and
their corresponding β-galactosidase assays. p51 and p66 represent the components of the HIV-1 reverse transcriptase heterodimer. The LexA DBD bait

The importance of the TP region was further validated us-
ing a 24 amino-acid peptide (WT TP; Figure 3A) in a com-
petition assay to disrupt the MLV IN CTD and Brd3 ET
complex. The TP region contains a W390-X7-PLK(I/L)-R402
motif that is conserved across gammaretroviral IN proteins
(Supplementary Figure S1C). Accordingly, in addition to
the wild-type (WT) peptide, a second 24-residue mutant TP
in which all five of the conserved amino acids in the con-
sensus motif are replaced with alanine (Figure 3A) was also
studied.
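
The alanine substitutions that distinguish the two peptides can be listed directly from the WT and mutant TP sequences given in the Methods. The short Python sketch below does this under one labeled assumption: that the first printed residue of each peptide corresponds to IN position 386, which places the tryptophan at position 390 in agreement with the motif numbering. The function name is hypothetical and is used for illustration only.

```python
# Peptide sequences exactly as printed in the Methods section.
WT_TP = "SRLTWRVQRSQNPLKIRLTREAP"
MUTANT_TP = "SRLTARVQRSQNAAAIALTREAP"

# Assumption: the first printed residue corresponds to IN position 386.
FIRST_RESIDUE = 386


def list_substitutions(wt, mutant, first_residue):
    """Return substitutions such as 'W390A' between two aligned peptides."""
    if len(wt) != len(mutant):
        raise ValueError("peptides must have the same length")
    return [
        f"{wt_aa}{first_residue + offset}{mut_aa}"
        for offset, (wt_aa, mut_aa) in enumerate(zip(wt, mutant))
        if wt_aa != mut_aa
    ]


print(list_substitutions(WT_TP, MUTANT_TP, FIRST_RESIDUE))
# prints ['W390A', 'P398A', 'L399A', 'K400A', 'R402A']
```

Under this numbering, the five substituted positions are the tryptophan, the proline-leucine-lysine triplet and the final arginine of the motif, while the intervening isoleucine at position 401 is left unchanged.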

Rotational correlation time (τc) measurements, summarized in Sup-
plementary Table S2 and supported by data in Supplemen-
tary Figure S5, demonstrate that the 24-residue WT TP dis-
rupts the heterodimeric IN CTD-Brd3 ET complex. [15N-
1H]-HSQC spectra presented in Figure 3B further demon-
strate that the WT TP disrupts the complex formed be-
tween 15N-enriched IN CTD (construct IN329-408) and Brd3
ET. Adding the WT TP to the complex of 15N-
enriched IN CTD bound to unenriched Brd3 ET dramati-
cally changes the spectrum, resulting in an [15N-1H]-HSQC
similar to that of the free full-length IN CTD. The spectrum
of 15N-enriched IN CTD in the presence of unenriched ET
and WT TP is shown superimposed on the spectrum of free
IN CTD in Figure 3B. The mutant TP, in which five key
residues are replaced with alanine, does not disrupt the complex between
IN CTD (construct IN329-408) and Brd3 ET (Figure 3C);
that is, the spectrum of the complex is not altered by adding
the mutant TP. These data demonstrate the impor-
tance of some or all of these five residues in the energetics
of complex formation.



Truncation of the CTD C-terminal TP results in decreased
integration at TSS and CpG islands

A hallmark of MLV integration is the preferential integra- 
tion within 2 kb of TSS and CpG islands (Figure 4A and B). 
Downregulation and inhibition of BET proteins have
been shown to influence this integration bias. The BET pro-
tein inhibitor JQ1 blocks binding of BET proteins to modi-
fied histones but maintains the interaction of BET proteins
with IN. In contrast, truncation of the MLV IN CTD resulted
in the direct loss of interaction with BET family members
(Figure 2, Supplementary Figure S4 and Table 1). Map- 
ping the integration site profile for MLV lacking the IN C- 
terminal TP thus tests whether the binding of BET family 
proteins drives MLV integration to TSS and CpG islands 
and presents a direct mechanism to alter the MLV integra- 
tion preference. 



Viruses lacking the C-terminal TP (ΔC) are viable in tis-
sue culture and multiple tags have been inserted into the C-
terminal segment of MLV IN (11,12,30-31). The integra-
tion target-site distribution of three constructs lacking the
C-terminal 23 amino acids of IN was examined (IN1-385XN
(in6215a) (11), IN1-385 8N and IN1-385 16H), each differing in
the non-viral amino-acid sequence tags at their terminus
(Supplementary Data). Viral titers were within 2-4-fold of
WT for all three IN ΔC viruses (Supplementary Table S3) and
cells transfected with all three IN ΔC constructs were pos-
itive for reverse transcriptase by the end of the second pas-
sage of cells (day 8 post transfection; Supplementary Fig-
ure S6A). Additionally, accumulation of reverse transcrip-
tion intermediates and the copy number of viral integrants
of the IN ΔC constructs were within 2-fold of WT MLV
(Supplementary Figure S6B and C). All three MLV IN ΔC
constructs showed markedly diminished preference for in-
tegration at TSS (Figure 4A, red arrow) and CpG islands
(Figure 4B, red arrow) compared to the experimental WT
MLV IN control and a compilation of published MLV in-
tegration sites generated by infection of 293T cells (Supple-
mentary Table S4). For example, integration of IN ΔC con-
structs within ±1 kb of TSS averaged ~2.5% of the
total integrants, whereas ~12% of the WT MLV integrants mapped
within this interval. Levels for the MLV IN ΔC constructs
remained above the published levels for the foamy virus
and HIV-1 integrants in 293 and 293T cells (Figure 4 and Sup-
plementary Table S4). Loss of localization to TSS and CpG
islands was not specific for oncogenes; analysis of house-
keeping genes showed a similar decrease in targeting their
promoter regions (Supplementary Figure S7). Analysis of
the local sequence bias at the site of the target DNA du-
plication revealed that the IN ΔC constructs maintained
the characteristic palindromic consensus sequence of MLV
(Supplementary Figure S8) at the scissile bonds (32). These
results indicate that the loss of the IN C-terminal TP, which
is the key IN:BET interaction domain, results in redistribu-
tion of viral integration sites away from promoter regions
without grossly altering virus viability (11).
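
The percentage of integrants within a window around TSS, as quoted above for the ±1 kb interval, can be computed from mapped coordinates as sketched below in Python. This is a minimal illustration and not the analysis pipeline used in this study, which follows (5) and (8). It assumes positions on a single chromosome given as integer base-pair coordinates; the function names and the toy coordinates are hypothetical.

```python
from bisect import bisect_left


def nearest_distance(position, sorted_sites):
    """Distance (bp) from a position to the closest site in a sorted list."""
    index = bisect_left(sorted_sites, position)
    # Only the neighbors on either side of the insertion point can be closest.
    neighbors = sorted_sites[max(index - 1, 0):index + 1]
    return min(abs(position - site) for site in neighbors)


def percent_within_window(integrations, tss_positions, window_bp=1000):
    """Percentage of integration sites within +/- window_bp of any TSS."""
    if not integrations or not tss_positions:
        raise ValueError("both coordinate lists must be non-empty")
    sorted_tss = sorted(tss_positions)
    hits = sum(
        1 for position in integrations
        if nearest_distance(position, sorted_tss) <= window_bp
    )
    return 100.0 * hits / len(integrations)


# Toy coordinates for illustration only (not data from this study).
toy_tss = [10000, 50000]
toy_integrations = [9500, 11000, 11001, 30000, 50500]
print(percent_within_window(toy_integrations, toy_tss))  # prints 60.0
```

Sorting the TSS coordinates once and using a binary search keeps the lookup fast for large integration-site datasets; a genome-wide version would repeat the calculation per chromosome.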

MLV IN ΔC integration sites lose association with known
BET protein binding sites in 293 cells

In 293 cells, binding sites of BET proteins (Brd2, Brd3 and
Brd4) have been identified by chromatin immunoprecipita-
tion and mapped onto the human genome (19). The correla-
tion of these BET protein binding sites and integration sites
of MLV WT IN and IN ΔC viruses was examined (Figure
4C). Analysis was performed using the total BET protein
binding sites (Figure 4C: promoters + within genes + inter-
genic regions) as well as those limited to specific pro-
moter regions (Figure 4D). WT IN integrations correlated
with identified Brd2, Brd3 and Brd4 binding sites compared
to matched random controls throughout the host chromo-
somes. Integrants obtained from all three IN ΔC isolates
showed a marked decrease (~2-fold, P < 0.001) compared
to the WT IN integrants. Interestingly, this effect was local-
ized to within 100 bp of a known BET protein-binding site.
Approximately 9-13% of BET protein binding sites map
within 2 kb of genes (19). Analysis of the MLV integrants
from IN ΔC isolates indicated a more pronounced decrease
in association with BET protein binding sites (4-fold) lo-
cated within promoters upon loss of the IN C-terminal TP
(Figure 4D), mapping to within 100 bp of the identified
Brd2, Brd3 or Brd4 sites. These results indicate that the loss
of the IN C-terminal tail results in the loss of targeting to
identified binding sites of BET proteins.

Figure 3. Mutating the consensus TP sequence inhibits interaction of
MLV IN CTD with Brd3 ET. A. The 24-residue WT TP sequence is dis-
played on top and the mutant TP sequence is displayed on bottom. Under-
lined residues indicate amino-acid residues mutated to alanine. B. Com-
parison of [15N-1H]-HSQC spectra of the complex formed between 15N-
enriched IN CTD (construct IN329-408) and unlabeled Brd3 ET in the
presence of the WT TP (blue) and the free 15N-enriched IN CTD (con-
struct IN329-408) spectra (red; same as Supplementary Figure S1B). The
stoichiometric ratio of IN329-408 (200 µM) and Brd3 ET (200 µM) was
1:1 and the peptide was added at 3-fold molar excess (600 µM). Under
these conditions, the WT TP disrupts the complex by binding to the ET
domain, resulting in a spectrum for CTD that is different from that of
the complex, but essentially identical to that of free IN CTD. C. Com-
parison of [15N-1H]-HSQC spectra of 15N-enriched IN CTD (construct
IN329-408) and unlabeled Brd3 ET in the presence (red) or absence (blue)
of the mutant TP. The stoichiometric ratio of IN329-408 (200 µM) and Brd3
ET (200 µM) was 1:1 and the mutant peptide was added at 3-fold excess
(600 µM). The mutant TP does not disrupt the complex, and the spec-
trum is not changed when the peptide is added. In both panels, backbone
amide resonances are labeled in black and peak resonances labeled in ma-
genta are assigned to the non-cleavable affinity tag. Buffer conditions are
as follows. Buffer 1: 25 mM sodium phosphate pH 8.0, 300 mM NaCl,
50 mM potassium glutamate, 5 mM 2-mercaptoethanol. Buffer 2: 25 mM
sodium phosphate pH 8.0, 360 mM NaCl, 60 mM potassium glutamate, 6
mM 2-mercaptoethanol.

DISCUSSION 

The results of this study establish the molecular mechanism 
of MLV retroviral integration into TSS and CpG islands 
by tethering MLV IN with BET family proteins Brd2, Brd3 
and Brd4 (Figure 5). Using solution NMR and biochemi- 
cal studies, the interaction was localized predominantly to 
the C-terminal polypeptide tail segment of the MLV IN 
protein, including residues 386-408. In the absence of the 
BET proteins, the MLV IN C-terminal peptide (TP) is un- 
structured. It becomes structured upon complex formation 
with the Brd3 ET domain. This disorder-to-order transi- 
tion may contribute a significant entropic component to the 
energetics of complex formation. The TP is non-essential 
for virus viability; however, viruses lacking the TP show 
marked diminution of integrations at the TSS and CpG is- 
lands, which are favored targets of MLV DNA integration 
in chromosomes. Loss of the MLV IN TP also correlated 
with the loss of association with BET protein binding sites 
within 293 cells. 

Backbone CSPs upon complex formation indicate a 
change in the environment of the corresponding backbone 
atoms. Such changes implicate the corresponding region of 
the polypeptide or protein in the molecular recognition pro- 
cess, but do not necessarily demonstrate a direct contact of 
the corresponding residue in the intermolecular interface; 
that is, CSPs do not necessarily indicate direct intermolec- 
ular interactions. Site directed mutations could be used to 
validate the role of specific atoms in interfacial interactions, 
but the interpretation of such data requires structural data 
demonstrating that such sequence changes do not disrupt 
the structure of the CTD or the CTD-ET complex distal 
from the site of mutation. In any case, the extensive CSP 
data for free and ET-bound IN CTD, summarized in Figure 1 
and Supplementary Table S1, provide unequivocal evidence 
for structural changes in the C-terminal TP region of IN 
CTD upon complex formation, providing a structural basis 
for the interaction between these two domains. 
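
As an illustration of how a backbone CSP is commonly quantified, the sketch below combines the 1H and 15N chemical shift changes of a single amide resonance into one weighted value. The weighting convention and the scaling factor `n_scale` are assumptions made for illustration; the exact formula used to generate the CSP data in this study is not stated in this section, and the function name `combined_csp` is hypothetical.

```python
import math


def combined_csp(h_free, n_free, h_bound, n_bound, n_scale=0.2):
    """Combined 1H/15N chemical shift perturbation (ppm) for one backbone amide.

    n_scale down-weights the 15N shift change to account for its larger
    chemical shift range; the default value is an assumed convention.
    """
    d_h = h_bound - h_free
    d_n = n_bound - n_free
    return math.sqrt(d_h ** 2 + (n_scale * d_n) ** 2)


# Hypothetical shifts (ppm) for one residue in the free and bound states.
csp = combined_csp(h_free=8.00, n_free=120.0, h_bound=8.03, n_bound=120.2)
```

A large value flags a residue whose environment changes on binding; as noted above, it does not by itself show that the residue makes a direct intermolecular contact.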
Figure 4. MLV IN ΔC integrations lose association with TSS, CpG islands and known BET protein binding sites in 293 cells. A. Percentages of integration 
sites in 293 cells are plotted with respect to the distance from the annotated TSS compared to matched random controls (MRCs) (50). B. Percentages of 
integration sites are plotted with respect to the distance from the 5' end of the nearest CpG islands compared to MRCs. C. Total BET binding sites in 293 
cells. Integration sites were measured for their proximity to the 5' boundary of known BET protein binding sites compared to MRCs. MRCs are selected 
randomly from the host genome with respect to a restriction site and should have no relation to chromatin sites. D. BET protein chromatin binding sites that 
overlap with promoters as defined in LeRoy et al. ((19); additional file 8). Loss of association of the IN ΔC versus the WT IN within 100 bp of chromatin 
sites bound by BET proteins was statistically significant with P < 0.001 for both the total BET sites and those localized to specific promoters. 
A square bracket denotes inclusion of the limit, while a parenthesis denotes exclusion. 



Figure 5. Model for MLV integration. A. Assembly of BET proteins on an acetylated histone tail (blue) in the presence of additional host factors and RNA 
Pol II (together marked as X) at the TSS. MLV IN interacts with the ET domain of BET proteins through the IN TP (red), resulting in a preponderance of 
integrations near TSS. B. Loss of the MLV IN TP results in loss of association with BET proteins and in decreased targeting to BET binding sites and 
TSS. 

The C-terminus of MLV IN is non-essential for IN enzymatic 
activity in vitro (13) and for virus viability in tissue culture 
(12,31); however, second-site revertants have been isolated 
from modified IN ΔC virus where the IN C-terminal 
tail was restored (31). The MLV IN TP overlaps with the 
Env signal peptide in an alternative reading frame. The 
sequence conservation within the IN TP/4070A amphotropic 
Env overlap region displays a bias toward maintenance of 
the IN reading frame (33). This suggests that the C-terminal 
segment of IN may have a functional role in vivo. Alignment 
of retroviral IN proteins with known targeting to 
TSS and CpG islands shows conservation of the sequence 
390WRVQRSQNPLKIRLTR405 located at the MLV IN 
C-terminus (Supplementary Figure S1C). This homology 
extended to the core consensus of 390WX7PLKJR402 (Figure 
S1C; J = I/L) in RaLV, GaLV and KoRV, but the integration 
site preferences for these viruses have not been analyzed. 
It is predicted that these residues would be critical for the 
interaction with BET family members. Indeed, the peptide 
competition assay (Figure 3 and Supplementary Table S2) 
shows that one or more of residues W390, P398, L399, K400 
and R402 contribute to the energetics of complex formation. 
Additionally, IN W390A has been shown to have reduced 
binding affinity for Brd4 ET (10). 
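
The core consensus W-X7-P-L-K-(I/L)-R can be written as a simple pattern for scanning IN C-terminal sequences. The sketch below is illustrative only: the helper name `find_core_motif` is hypothetical, and the test sequence is the MLV IN tail segment (residues 390-405) quoted in the text, together with a W390A variant of it.

```python
import re

# W, any seven residues, then P-L-K, then I or L (written J in the text), then R.
CORE_MOTIF = re.compile(r"W.{7}PLK[IL]R")


def find_core_motif(seq):
    """Return (0-based start index, matched segment) of the first core
    consensus match in `seq`, or None if the sequence has no match."""
    match = CORE_MOTIF.search(seq)
    if match is None:
        return None
    return match.start(), match.group()


mlv_tail = "WRVQRSQNPLKIRLTR"   # MLV IN residues 390-405, as quoted in the text
w390a_tail = "ARVQRSQNPLKIRLTR"  # same segment with the W390A substitution

wt_hit = find_core_motif(mlv_tail)
mutant_hit = find_core_motif(w390a_tail)
```

A pattern match of this kind only flags candidate sequences; whether a given IN tail actually binds the ET domain would still need to be tested experimentally.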

The function of the IN C-terminal tail appears analo- 
gous to that of the histone (34-35) and RNA polymerase 
II tails (36), as an otherwise unstructured docking site for 
additional proteins or regulatory factors (Figure 5). Inter- 
estingly, the NS1 protein of influenza A H3N2 subtype is 
proposed to use a related histone mimic as a means to asso- 
ciate with the human PAF1 transcription elongation com- 
plex and thus inhibit the antiviral response (37). 

Targeting of IN integration to active promoter regions 
is beneficial for expression of the viral genome for subse- 
quent rounds of infection. However, targeting to promoter 
regions only accounts for approximately 25% of all MLV 
retroviral integrations. Loss of targeting through the BET 
proteins does not affect the majority of MLV integration 
events. The local sequence selection at the site of integration 
(LOGOs analysis) remains unchanged in the absence of the 
IN TP. The quantitation of integration copy numbers of the 
three replicating IN ΔC isolates was statistically similar to 
WT IN and consistent with the 2-fold variation observed for 
a single round of infection (12). Additionally, reverse 
transcription of the viral genome into dsDNA is not grossly 
affected by the absence of the IN TP (Supplementary Figure 
S6B and C). Variations in viral titers (between 2- and 4-fold) 
and the time course of viral spread were observed between 
the three IN ΔC isolates. These assays depend on the level 
of expression of the viral or transgene mRNA. Loss of tar- 
geting to highly expressed promoter regions can result in a 
decrease in the total level of viral mRNA produced and thus 
be reflected in minor variations in viral titer and subsequent 
proviral integrations. Alternatively, the differences in the IN 
terminal amino acids may affect the association with other 
unidentified host factors. 

Recently, it was reported that three residues in the MLV 
IN CCD are important in the interaction with BET proteins 
(9). It is unclear whether these mutations compromise MLV 
IN catalytic activities or indirectly affect interactions with 
BET proteins. Despite an intact TP region, interaction with 
BET proteins was lost in the context of these CCD mutations. 
It is possible that the presence of a C-terminal FLAG 
tag used in that study might compromise the binding 
potential of the TP region. Loss of the IN TP, however, did not 
compromise IN catalytic activities within the CCD or those 
requiring multimerization, including two-end concerted 
integration (Supplementary Figure S9). 

Interestingly, BET family members have been implicated 
in other viral systems, where they are used to regulate viral 
transcription as well as to tether the viral DNA complexes to 
mitotic chromosomes. Both the KSHV LANA protein and 
the Merkel cell polyomavirus T antigen bind Brd4 through 
protein-protein interactions with the ET domain (38-40). 
In addition, various animal and human papillomaviruses 
interact with the host protein Brd4 (41). For MLV, integration 
at BET protein binding sites positions the virus within 
transcriptionally active regions of the host chromosomes 
and should facilitate viral gene expression. 

Multiple retroviral and retrotransposon systems utilize 
host proteins to influence their position of viral integration. 
Integration of HIV-1 is directed within genes through 
association with the host factor LEDGF/p75 (22,42-44). Yeast 
Ty1 and Ty3 target integration into tRNA genes. Ty3 interacts 
with Brf1, TFIIB and TBP, which are involved in Pol III 
transcription (45). Recent studies of Ty1 targeting to tRNA 
genes implicate a possible histone modification near Pol III 
transcription sites as a driving factor for recognition (46). 
For Tf1, it is known that a chromodomain at the C-terminus 
interacts with the host Atf1 protein to drive integration into 
genes associated with environmental stresses (47). 

The function of the ET domain of BET family members 
is not well defined. Recent studies have identified NSD3, 
JMJD6, CHD4, GLTSCR1 and ATAD5 proteins as binding 
partners of the ET domain of BET family members (48), 
suggesting that the ET domain functions to recruit specific 
effector proteins to regulate transcriptional activity. It is not 
known if binding of the MLV IN TP interferes with the 
function of the ET domain and/or the association of cellular 
proteins with this domain. The solution structure of the Brd4 
ET domain consists of three α-helices plus a loop structure, 
with no close structural homologs (49). Binding to IN is 
localized to the acidic patch in the α2-α3 loop region (9-10). 
The solution structure of the ET domain:IN CTD complex 
will be interesting to elucidate. 

Mapping of WT integrated MLV proviruses shows fa- 
vored targeting within 2 kb of the TSS and CpG islands (3). 
For the WT IN, correlation with the chromatin binding sites 
of BET proteins showed a tighter association, favoring in- 
tegration within 100 bp of the defined BET protein binding 
sites for Brd2, 3 and 4. The two analyses measure distinct 
features, as the positions of the BET sites with respect to 
the TSS are not defined. Loss of the IN TP did not affect 
a panel of additional features, including integration in 
gene-dense regions (50). MLV integrants are reported to be 
associated with the supermarkers (51) containing H3K4me1, 
H3K4me3, H3K9ac chromatin marks and STAT1 binding 
sites. Indeed, comparison of the integration sites of the WT 
IN versus the IN ΔC constructs from this study with the 
epigenetic maps of HeLa and resting T cells correlated the 
loss of the IN TP with the loss of integrants at H3K4me1, 
H3K4me3, H3K9ac chromatin marks, STAT1 binding sites 
and H2AZ sites (data not shown). Understanding the 
additional proteins that assemble with the preintegration:BET 
complexes will shed light on the molecular mechanisms 
associated with recognizing these supermarkers as well as 
those that stabilize gene or promoter-specific targets. This 
knowledge has direct implications for understanding the 
potential of murine-based vectors for gene activation result- 
ing from insertional viral mutagenesis. 

Eliminating the interactions with BET family members 
has implications for the use of gammaretroviral vectors for 
gene therapy and gene delivery. Foamy viruses, which have 
limited bias to integrate at TSS and CpG islands, have been 
shown to be apathogenic in humans (52). Vectors lacking 
the IN TP would decrease targeting to promoter regions 
of all classes of genes, not specifically oncogenes. This 
nonetheless has the potential to decrease the probability of 
promoter/enhancer insertional mutagenesis due to viral 
integration at promoter regions. The modulation of overall 
expression of the transgene due to chromosomal positional 
effects might become more evident in the absence of BET 
protein targeting. Current studies are aimed at defining the 
potential of oncogene activation resulting from integration 
of MLV bearing the IN ΔC proteins. 